ABSTRACT

We present the results of a search for damped Lyman-α (DLA) systems in the Sloan Digital Sky Survey II (SDSS), Data Release 7. We
use a fully automatic procedure to identify DLAs and derive their column densities. The procedure is checked against the results of
previous searches for DLAs in SDSS. We discuss the agreements and differences and show the robustness of our procedure. For each
system, we obtain an accurate measurement of the absorber's redshift, the H i column density and the equivalent width of associated
metal absorption lines, without any human intervention. We find 1426 absorbers with 2.15 < z < 5.2 with log N(H i) > 20, out of
which 937 systems have log N(H i) > 20.3. This is the largest DLA sample ever built, made available to the scientific community
through the electronic version of this paper.

In the course of the survey, we discovered the intervening DLA with the highest H i column density known to date, with log N(H i) =
22.0 ± 0.1. This single system provides a strong constraint on the high end of the N(H i) frequency distribution, now measured with
high accuracy.

We show that the presence of a DLA at the blue end of a QSO spectrum can lead to important systematic errors and propose a method 
to avoid them. This has important consequences for the measurement of the cosmological mass density of neutral gas at z ~ 2.2 and 
therefore on our understanding of galaxy evolution over the past 10 billion years. 

We find a significant decrease of the cosmological mass density of neutral gas in DLAs, from z = 4 to z = 2.2, consistent with
the result of previous SDSS studies. However, and contrary to other SDSS studies, we find that Ω_g^DLA(z = 2.2) is about twice the value
at z = 0. This implies that Ω_g^DLA keeps decreasing at z < 2.2.

Key words. cosmology: observations - quasars: absorption lines - galaxies: evolution
1. Introduction

Despite accounting for only a small fraction of all the baryons
in the Universe (see Fukugita & Peebles 2004), the neutral and
molecular phases of the interstellar medium are at any redshift
the reservoir of gas from which stars form. Therefore, determining
the cosmological mass density of neutral gas (Ω_HI) and its
evolution in time is a fundamental step forward towards
understanding how galaxies form (Klypin et al. 1995).

In the local Universe, neutral gas is best traced by the hyperfine
21-cm emission of atomic hydrogen. Its observation allows for an
accurate measurement of the neutral gas spatial distribution in
nearby galaxies and strongly constrains the column density
frequency distribution (Zwaan et al. 2005b) and Ω_HI(z = 0)
(Zwaan et al. 2005a). However, the limited sensitivity of current
radio telescopes prevents direct detections of H i emission beyond
z ~ 0.2 (Lah et al. 2007; Verheijen et al. 2007; Catinella et al. 2008).

At high redshift, most of the neutral hydrogen is revealed by
the Damped Lyman-α (DLA) absorption systems detected in the
spectra of background quasars. While most of the gas is likely to
be neutral for log N(H i) > 19.5 (Viegas 1995), the conventional
definition for Damped Lyman-α systems is log N(H i) > 20.3
(Wolfe et al. 1986). Not only does this correspond to a convenient
detectability limit in low-resolution spectra, but also to a
critical surface-density limit for star formation. Since DLAs are
easy to detect and their column densities can be measured
accurately, it is possible to measure the cosmological mass density
of neutral gas at different redshifts, independently of the exact
nature of the absorbers, provided a sufficiently large number of
background quasars is observed. However, any bias affecting the
selection of the quasars or the determination of the redshift
pathlength probed by each line of sight can affect the measurements.

The Lick survey, the first systematic search for DLAs,
led to the detection of 15 systems at ⟨z⟩ ~ 2.5
along the line of sight to 68 quasars (Wolfe et al. 1986;
Turnshek et al. 1989; Wolfe et al. 1993). About one hundred
quasars were subsequently surveyed for DLA absorptions by
Lanzetta et al. (1991). A number of surveys have followed (e.g.
Wolfe et al. 1995; Lanzetta et al. 1995; Storrie-Lombardi et al.
1996a,b; Storrie-Lombardi & Wolfe 2000; Ellison et al. 2001;
Péroux et al. 2003; Rao et al. 2005, 2006), each of them
contributing significantly to increase the number of known DLAs
and the redshift coverage. On the other hand, surveys at low and
intermediate redshifts are difficult because they require UV
observations and the number of confirmed systems is building up
In turn, Péroux et al. (2001, 2003) aimed at the highest redshifts
by observing 66 quasars with z_em > 4. They included data
from previous surveys in their analysis, which led to the largest 
DLA sample available at that time. They found no significant 
evolution of the cosmological mass density of neutral gas over 
the redshift range 1 < z < 4 and suggested that a significant part 
of this mass is due to systems with neutral hydrogen column 
densities below the conventional DLA cutoff limit. 

The most recent contribution to the current census of DLAs
is a semi-automatic data mining of thousands of quasar spectra
from the Sloan Digital Sky Survey (Prochaska & Herbert-Fort
2004; Prochaska et al. 2005; Prochaska & Wolfe 2009), revealing
more than 700 new DLAs and thus increasing by one order of
magnitude the number of known DLAs at z > 2.2. Key results
include the indication that the N(H i)-frequency distribution
deviates significantly from a single power-law. Its shape is found
to be nearly invariant with redshift, while its normalisation does
change with redshift. In their work, Prochaska & Wolfe found
that the incidence of DLAs and the neutral gas mass density in
DLAs (Ω_g^DLA) decrease significantly with time between z ~ 3.5
and z = 2.2. The mass density in DLAs at z ~ 2.2 is claimed
to be consistent with that measured at z = 0 by Zwaan et al.
(2005b). However, this is difficult to reconcile with the result
from Rao et al. (2006) that Ω_g^DLA stays constant over the range
1 < z < 2 and then decreases strongly to reach the value derived
from 21 cm observations at z = 0.

Rao et al.'s results using HST could be questioned because
DLAs are not searched directly but rather are selected on the
basis of strong associated Mg ii absorption. There is indeed an
excess of high column densities in the HST sample compared to
the SDSS sample (Rao et al. 2006) that could be real or induced
by some selection bias. On the other hand, the SDSS measurement
could be affected by some bias due to the limited signal-to-noise
ratio at the blue end of the spectra corresponding to z ~ 2.
Motivated by the importance of these issues for our understanding
of galaxy formation and evolution, and by the discrepancy
in the results of different surveys, we developed robust, fully
automatic procedures to search for, detect and analyse DLAs in
the seventh and last data release (DR7) of QSO spectra from the
Sloan Digital Sky Survey II. We present our methods and the
algorithms used to analyse the data in Sections 2 and 3, and the
statistical results on the neutral gas column density distribution and
cosmic evolution of Ω_g^DLA in Section 4, with a special emphasis
on discussing systematic effects. We conclude in Section 5.



2. Quasar sample and redshift path 

The quasar sample is drawn from the Sloan Digital Sky Survey
Data Release 7 (Abazajian et al. 2009) and includes every
point-source spectroscopically confirmed as quasar (specClass=QSO
or HIZ_QSO). Basically, SDSS quasars are pre-selected either
upon their colours, avoiding the stellar locus (Newberg & Yanny
1997), or from matching the FIRST radio source catalogue
(Becker et al. 1995, see Richards et al. 2002 for a full description
of the quasar selection algorithm in SDSS) and then
confirmed spectroscopically.

Given the blue limit of the SDSS spectrograph (3800 Å), we
selected quasars whose emission redshift is larger than z_em =
2.17, with a confidence level higher than 0.9. This gives 14616
quasar spectra that were retrieved from the SDSS website (1).

(1) http://www.sdss.org



2.1. Redshift path 

The first step is to define the redshift range over which to search
for DLAs along each line of sight. As in most DLA surveys,
we define the maximum redshift, z_max, at 5000 km s⁻¹ bluewards
of the QSO emission redshift. This is mostly to avoid
DLAs located in the vicinity of the quasar (e.g. Ellison et al.
2002). For defining the minimum redshift, we note that Lyman-limit
systems (LLS) prohibit the detection of any absorption feature
at wavelengths shorter than the corresponding Lyman break
(λ < (1 + z_LLS) × 912 Å). To ensure that the minimum redshift,
z_min^0, is set redwards of any Lyman break possibly present in
the spectrum, we run a 2000 km s⁻¹-wide (29 pixels) sliding window
starting from the blue end of the spectrum and define z_min^0 as
the centre of the first window where the median signal-to-noise
ratio exceeds 4.
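The sliding-window definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_zmin0`, the input lists (`wavelength` in Å, `flux`, `error`) and the convention that a non-positive error gives zero signal-to-noise ratio are all hypothetical choices, not the code used for the survey.

```python
from statistics import median

LYA = 1215.67        # Ly-alpha rest wavelength in Angstrom
WINDOW_PIXELS = 29   # ~2000 km/s at the SDSS pixel scale, as quoted in the text
SNR_THRESHOLD = 4.0  # median SNR required inside the window


def find_zmin0(wavelength, flux, error,
               window=WINDOW_PIXELS, threshold=SNR_THRESHOLD):
    """Return z_min^0 from the first window (blue to red) whose median
    SNR exceeds `threshold`, or None when no window qualifies."""
    # Per-pixel SNR; pixels without a valid error are given SNR = 0
    # (an assumption made for this sketch).
    snr = [f / e if e > 0 else 0.0 for f, e in zip(flux, error)]
    for start in range(0, len(snr) - window + 1):
        if median(snr[start:start + window]) > threshold:
            centre = start + window // 2
            # Convert the central wavelength to a Ly-alpha redshift.
            return wavelength[centre] / LYA - 1.0
    return None
```

Because the median is used, a handful of strongly absorbed pixels inside the window does not prevent it from qualifying, whereas a Lyman break or a saturated trough covering more than half of the window does.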

This definition is very similar to that of Prochaska et al.
(2005) and guarantees robust detections of DLAs in spectra of
minimum SNR. The 3347 quasars with z_min^0 > z_max were
not considered any further. Note that the actual minimum redshift
can be affected by the presence of a DLA located by chance
at the blue limit of the wavelength range. Given the importance
of this effect, we postpone its full discussion to Section 4.2.



2.2. Quality of the spectra 

It is very difficult to control the search for DLAs in spectra of 
poor quality. Therefore, the spectra of bad quality should be re- 
moved a priori from the sample. For this, we must use an indica- 
tor of the spectral quality in the redshift range to be searched 
for, [z_min^0, z_max], that does not depend upon the presence of a
damped Ly-α absorption, otherwise we may introduce a bias
against DLA-bearing lines of sight. 

We estimated the quality of the spectra - independently of
the presence of DLAs - by measuring the median continuum-to-noise
ratio in the redshift range [z_min^0, z_max] defined above. The
continuum over the Lyman-α forest (i.e. the unabsorbed quasar
flux) is estimated by fitting a power law to the quasar spectrum,
including the wavelength range 1215.67 × [1 + z_min^0, 1 + z_max] in
the blue of the Ly-α emission and regions free from emission
lines in the red. We ignore the range 5575-5585 Å, which is
affected by the presence of dead pixels in the CCD. Deviant pixels,
mainly due to Ly-α absorption lines in the blue and metal absorption
lines in the red, are first ignored by using Savitzky-Golay
filtering. Then we iteratively remove deviant pixels by decreasing
their weight at each iteration. This procedure converges very
quickly. A double Gaussian is then fitted on top of the Ly-α+N v
emission lines to reproduce the increased flux close to z_max. The
noise is taken from the error array. Quasar spectra with median
continuum-to-noise ratio lower than four were not considered
any further. We are then left with 9597 quasar spectra.



2.3. Broad absorption line (BAL) quasars 

Broad absorption lines from gas associated with the quasar can 
possibly be confused with DLAs. Our purpose here is not to 
recognise all BAL quasars but rather to automatically select the 
quasars without strong BAL outflows in order to avoid contami- 
nation of the DLA sample by broad lines, i.e., O vi, H i and N v. 

Therefore, outflows were automatically identified by searching
for wide absorptions (extended over a few thousand kilometres
per second, see Weymann et al. 1991) close to the quasar
Si iv and/or C iv emission lines.



P. Noterdaeme et al.: Evolution of the cosmological neutral gas mass density



[Figure 1: quasar spectrum; horizontal axis: rest-frame wavelength (Å)]

Fig. 1. Spectrum of the quasar SDSS J104109.86+001051.76
featuring broad absorption lines. The fit to the quasar continuum 
is shown by the thick line. The thin line represents a decrement 
of 20% in the quasar flux. Pixels satisfying the BAL criteria in- 
side the running windows (see text) are marked with blue dots. 



For this, we computed the normalised spectrum R(λ) in the
region λ_obs = [1350, 1550] × (1 + z_em) as the ratio of the observed
spectrum to the continuum derived following the procedure by
Gibson et al. (2009). The quasar continuum was modelled by
the product of the SDSS composite spectrum (Vanden Berk et al.
2001) with a third order polynomial. This reproduces well
the combination of reddening and intrinsic shape of the
quasar spectrum, as well as the overall shape of the emission
lines, with a very limited number of parameters. However, the
exact shape of the Si iv and C iv emission lines is accurately
reproduced only when a Gaussian is added at the position of the
lines. We also adjusted the emission redshift by cross-correlating
the reddened composite spectrum with the observed one in the
red wings of the Si iv and C iv emission lines, the blue wings
being possibly affected by BALs. An example of continuum fitting
is shown in Fig. 1.

We excluded the quasars whenever R(λ) is continuously less
than 0.8 over 1000 km s⁻¹, or R(λ) < 0.8 over at least 75% of a
3000 km s⁻¹ wide window, running between 1350 and 1550 Å in
the quasar's rest frame. With this definition, we are most sensitive
to broad absorption lines with balnicity indexes (BI) larger
than 1000 km s⁻¹. The balnicity index (BI) characterises the
strength of a BAL (see e.g. Gibson et al. 2009). Note that the core
of a damped Ly-α absorption will be larger than this value, so
that possible broad H i, O vi and/or N v lines from systems with
BI < 1000 km s⁻¹ have little chance to mimic a DLA. The procedure
excludes 1258 BAL quasars among the 9597 quasars left
after the previous steps, i.e. after checking for adequate
signal-to-noise ratio and redshift range available along the line of sight.
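The two exclusion criteria can be expressed compactly on a normalised spectrum sampled on a constant velocity grid. The following Python sketch is illustrative only: the function name `is_bal`, the assumed pixel width of 69 km s⁻¹ (inferred from the 2000 km s⁻¹ = 29 pixels quoted in Section 2.1) and the rounding of velocity widths up to whole pixels are assumptions, not the survey's implementation.

```python
import math

PIXEL_KMS = 69.0  # assumed pixel width in velocity (2000 km/s ~ 29 pixels)


def is_bal(norm_flux, depth=0.8, run_kms=1000.0, window_kms=3000.0,
           fraction=0.75, pixel_kms=PIXEL_KMS):
    """Return True if the normalised spectrum R(lambda), restricted to the
    1350-1550 A rest-frame region, satisfies either BAL criterion."""
    below = [r < depth for r in norm_flux]
    run_pix = math.ceil(run_kms / pixel_kms)     # pixels in 1000 km/s
    win_pix = math.ceil(window_kms / pixel_kms)  # pixels in 3000 km/s

    # Criterion 1: R < depth continuously over run_kms.
    streak = 0
    for flag in below:
        streak = streak + 1 if flag else 0
        if streak >= run_pix:
            return True

    # Criterion 2: R < depth over at least `fraction` of a running window.
    for start in range(0, len(below) - win_pix + 1):
        if sum(below[start:start + win_pix]) >= fraction * win_pix:
            return True
    return False
```

The second criterion catches troughs that are broken up by a few noisy or partially recovered pixels, which the strict continuity test of the first criterion would miss.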

We have checked this automatic procedure by comparing our
list of rejected quasars with the catalogue of SDSS BAL quasars
from Trump et al. (2006). Fig. 2 shows the balnicity index
distribution of all quasars in Trump et al. (2006) (unfilled histogram)
and that of the quasars our procedure excludes (red shaded
histogram); the difference is shown as a blue shaded histogram. It
can be seen that for BI > 500 km s⁻¹ our procedure misses very
few BAL QSOs. We have checked that the missed BAL QSOs
have no strong O vi, H i or N v absorption line that could mimic
a DLA.

We are thus left with the spectra of 8339 QSOs, without
strong BALs and with sufficient signal-to-noise ratio to search
for damped Lyman-α systems. We call this sample S_QSO^0. Note
that the study of Prochaska & Wolfe (2009, hereafter PW09) using
DR5 is based on 7482 QSOs.

[Figure 2: histogram; horizontal axis: BI (km s⁻¹)]

Fig. 2. Balnicity index (BI) distribution of the quasars in common
with the BAL catalogue of Trump et al. (2006) (solid black
line histogram). The BI measurements are from these authors.
The red right-dashed histogram corresponds to the quasars
identified as BAL by our procedure, while the blue left-dashed
histogram represents the distribution of the Trump et al. quasars
that are not removed by our procedure. Very few BAL quasars
with BI > 500 km s⁻¹ are missed by our automatic procedure.


3. Detection of DLAs and N(H i) measurements

DLA candidates are generally identified in low-resolution spectra
by their large Ly-α equivalent widths (W_r > 10 Å) and are
then confirmed by higher spectral resolution observations (e.g.
Wolfe et al. 1995). However, the resolving power of SDSS spectra
(R = λ/δλ ~ 1800) is high enough to detect the wings of
damped Lyman-α lines. Therefore, it is possible not only to identify
these lines but also to measure the corresponding H i column
densities from Voigt-profile fitting. Note that, because R is constant
along the spectrum, the pixel size is constant in velocity
space and DLA profiles of a given N(H i) are equivalently sampled
by the same number of pixels, regardless of their redshift.

The number of candidates in the SDSS is so large
that the detection techniques must be automated.
Prochaska & Herbert-Fort (2004) searched for DLA candidates
in SDSS spectra by running a narrow window along the
spectra to identify the core of the DLA trough as a region
where the signal-to-noise ratio is significantly lower than the
characteristic SNR in the vicinity. Candidates were then checked
by eye and the H i column density measured by interactive
Voigt-profile fitting.

We develop here a novel approach which makes use of all the 
information available in the DLA profile and, most importantly, 
is fully automatic. 
3.1. Search for strong absorptions

The technique is based on a Spearman correlation analysis. At
each pixel i in the spectrum, corresponding to a given redshift
for Lyman-α, synthetic Voigt profiles corresponding to different
column densities (N(H i)_j) differing by 0.1 dex are successively
correlated with the observed spectrum over a velocity interval
[v_min, v_max] corresponding to a decrement larger than 20% in the
Voigt profile. Each redshift (z_i) for which the Spearman's correlation
coefficient is larger than 0.5 with high significance (> 5σ)
is recorded. We then add the criterion that the area between the
observed (F_obs) and synthetic (F_synt) profiles is less than the
integrated error array (E_obs) on the interval [v_min, v_max]:

    \int_{v_{min}}^{v_{max}} (F_{obs} - F_{synt})^{+} \, dv < \int_{v_{min}}^{v_{max}} E_{obs} \, dv,    (1)

where (F_obs - F_synt)^+ = (F_obs - F_synt) if F_obs > F_synt and 0
otherwise. This definition allows the DLA line to be blended
with intervening Lyman-α absorbers.

The (z_i, N(H i)_j) pair with highest correlation for each DLA
candidate is then recorded. This provides a list of DLA candidates
with first guesses of N(H i) and z_abs.
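As an illustration of the two selection criteria, the Python sketch below computes a Spearman rank correlation and the one-sided area test for a single (z_i, N(H i)_j) trial. Several choices are assumptions made for this sketch and not statements about the actual implementation: the helper names (`spearman`, `is_candidate`), the use of the large-sample approximation ρ√(n-1) to express the significance in units of σ, and the replacement of the integrals by plain sums over pixels (valid because the pixel size is constant in velocity). The synthetic profile is taken as an input rather than computed here.

```python
import math


def _ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def is_candidate(f_obs, f_synt, err, rho_min=0.5, nsigma=5.0):
    """Apply the correlation and area criteria on [v_min, v_max]."""
    rho = spearman(f_obs, f_synt)
    # Large-sample approximation for the significance (assumption).
    significance = rho * math.sqrt(len(f_obs) - 1)
    # Only flux in excess of the synthetic profile counts, so additional
    # absorption from blended forest lines is not penalised.
    excess = sum(max(o - s, 0.0) for o, s in zip(f_obs, f_synt))
    return rho > rho_min and significance > nsigma and excess < sum(err)
```

In a full search, `is_candidate` would be evaluated for every pixel and every trial column density, keeping the pair with the highest correlation coefficient.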



3.2. Fits of Lyman-α and metal lines

For each candidate, we perform, in the vicinity of the candidate
redshift z_i, a cross-correlation of the observed spectrum with
an absorption template representing the most prominent
low-ionisation metal absorption lines (C ii λ1334, Si ii λ1526,
Al ii λ1670, Fe ii λλ1608, 2344, 2374, 2382, 2586, 2600,
Mg ii λλ2796, 2803). The template is a variant of a binary
mask (similar to those used to derive stellar radial velocities,
see e.g. Baranne et al. 1996), where each absorption line is
represented by a Gaussian with a width matching the SDSS
spectral resolution (see top panel of Fig. 3). We restrict the mask
to the metal lines expected in the red of the QSO Ly-α emission
line to avoid contamination by Ly-α forest lines. We then obtain
a sharp cross-correlation function (CCF, Fig. 3) which is itself
fitted with a Gaussian profile to derive a measurement of the 
redshift with an accuracy better than 10"^. Voigt profiles are 
fitted to each metal line after determination of the local continuum,
providing a measure of their equivalent widths. In case
of non-detection of low-ionisation lines, the same procedure is
repeated to detect C iv and Si iv absorption lines. Finally, when
no metal absorption line is detected automatically, we keep the
redshift obtained from the best correlation with the synthetic
Ly-α profile (z_i, see Section 3.1).

A Voigt-profile fit of the damped Lyman-α absorption is then
performed to derive an accurate measurement of N(H i) (bottom-right
panel of Fig. 3), taking as initial value the guess on N(H i)
from the best synthetic profile correlation and fixing the redshift
to the value derived as described above. Absorption lines from
the Ly-α forest are ignored iteratively by rejecting deviant pixels
with a smaller tolerance at each iteration.

For each system, we obtain an accurate measurement of the
absorber's redshift, the H i column density and the equivalent
width of associated metal absorption lines, without any human
intervention. We found 1426 absorbers with log N(H i) > 20,
among which 937 have log N(H i) > 20.3. This is the largest
DLA sample ever built. The distributions of H i column densities
and redshifts for the whole sample are shown in Fig. 4.
Fig. 3. Example of the SDSS spectrum of
J152529.18+292813.18 with a DLA line at λ_obs ~ 3880 Å
(top panel). The mask (template of low-ionisation absorption
lines) used to derive an accurate redshift (z_abs = 2.1850) from
the cross-correlation function (shown in panel CCF with the
origin of the pixel scale set from the first guess) is overplotted
in blue in the upper panel. The automatic fits to a few metal
absorption lines are shown in the left-hand side and middle panels
(C ii λ1334, Si ii λ1526, Fe ii λ1608, Al ii λ1670). Rest-frame
equivalent widths and associated errors are indicated (in Å)
at the bottom of each panel. The automatic Voigt-profile fit to
the damped Ly-α line is overplotted on the observed spectrum
in the bottom-right panel. The shaded area corresponds to the
uncertainty on the column density (N(H i) = 10^(20.56±0.21) cm⁻²).



3.3. Accuracy of the measurements and systematic errors

Direct comparisons can be performed between the sample of 
PW09, derived from SDSS Data Release 5, and the correspond- 
ing sample from our survey. Indeed, one would like to assess 
the completeness of each sample and the reliability of the corre- 
sponding detection procedures. 

For this, we consider only systems that are detected along
lines of sight covered by both surveys. We compare the lists
of systems with log N(H i) > 21 from this work and from
PW09. We check whether log N(H i) > 21 systems from a
given survey are detected in the other survey, whatever the
column density estimated in the second survey is. The limit on
N(H i) is chosen high enough so that no system is excluded
because of errors. In addition, these systems are the most
important for the analysis because, as we will show, they
contribute to more than one half of Ω_g^DLA. Only one such DLA
(at z_abs = 3.755 towards J130259.60+433504.5) among 76
in the PW09 list has been missed by our procedure, due to
spurious lines leading to a wrong CCF redshift measurement.
This means that our completeness at log N(H i) > 21 is
about 99%. In turn, we discovered 6 log N(H i) > 21 DLAs,
with redshifts in the range z = 2.3-3.8, that are not in
the PW09 list when they should be since the corresponding
lines of sight have been considered and the redshift of the
DLA covered. These are J100325.13+325307.0 (z_abs = 2.330);
J205509.49-071748.6 (z_abs = 3.553); J093251.00+090733.9
(z_abs = 2.342); J133042.52-011927.5 (z_abs = 2.881);
J092914.49+282529.1 (z_abs = 2.314) and J151037.18+340220.6
(z_abs = 2.323). Associated metal lines are detected for five of
them.

Fig. 4. Top: Histogram of H i column densities for the 1426
absorbers automatically detected in SDSS DR7. The vertical dotted
line marks the traditional DLA threshold value of log N(H i) =
20.3. Bottom: Histogram of absorption redshifts for the same
sample.

Fig. 5. Distribution of the difference in the column density
measurements from the two surveys, Δlog N(H i) =
log N(H i)_this work - log N(H i)_PW09, for the whole sample (black),
z < 3 (blue), 3 < z < 3.5 (orange) and z > 3.5 (red).
Distributions are corrected for the log N(H i) = 20.3 truncating
effect (see text). The non-corrected distribution for the whole
sample is represented by the dotted histogram.

The completeness at smaller N(H i) is more difficult to estimate
because of the uncertainty of individual measurements. We
nevertheless compared our detections to those of PW09 and found
that we recover more than 96% of their systems, with most of
the DLAs missed having log N(H i) ~ 20.3. We also checked
visually one hundred randomly-selected DLAs and found that
about 3% of the systems in our sample are false-positive detections
(1% at z < 3.2 and 7% at z > 3.2). The completeness of
both samples (PW09 and ours) is sufficiently high to have little
influence on the determination of the cosmological mass density
of neutral gas.

Next, we compare the N(H i) measurements for the same
systems in the two samples. Fig. 5 shows the distribution of
the difference in the DLA column densities, Δlog N(H i) =
log N(H i)_this work - log N(H i)_PW09, for the whole sample (black),
z < 3 (blue), 3 < z < 3.5 (orange) and z > 3.5 (red).
These distributions are corrected for the truncating effect at
log N(H i) = 20.3, i.e. for the fact that some systems may
have log N(H i) > 20.3 in one sample but log N(H i) < 20.3
in the other sample. Only systems for which the opposite of
Δlog N(H i) is also allowed are considered. The uncorrected
distribution for the whole sample is shown as a dotted line.

The dispersion in the whole sample is about 0.20 dex and
matches the typical error on individual N(H i) measurements.
There is a small systematic difference that increases with redshift
from less than 0.03 dex at z < 3 to 0.05 dex at z > 3.5, which
is likely due to the increasingly crowded Ly-α forest at higher
redshift.

4. Results 

In this section, we present the statistical results from our DLA 
survey. Here, we use the standard definitions for the different 
statistical quantities. 
The absorption distance X is defined as

    X(z) = \int_0^z (1+z')^2 \frac{H_0}{H(z')} \, dz',    (2)

where H_0 is the Hubble constant and H(z) =
H_0 [(1 + z)^3 Ω_m - (1 + z)^2 (Ω_m + Ω_Λ - 1) + Ω_Λ]^(1/2). The
cosmological mass density of neutral gas, Ω_g, is given by

    \Omega_g(X) \, dX \equiv \frac{\mu m_H H_0}{c \rho_c} \int_{N_{min}}^{N_{max}} N(H i) \, f_{HI}(N, X) \, dN \, dX,    (3)

where f_HI(N, X) is the N(H i) frequency distribution (i.e.
f_HI(N, X) dN dX is the number of systems within (N, N + dN) and
(X, X + dX)), μ = 1.3 is the mean molecular mass of the gas, m_H is
the mass of the hydrogen atom, c is the speed of light and
ρ_c is the critical mass density. Setting N_min = 2 × 10^20 cm⁻² and
N_max = ∞ gives Ω_g^DLA, the cosmological mass density of neutral
gas in DLAs. Since at the column densities of DLAs the gas is
neutral, this is also the total mass density of the gas in DLAs. In
the discrete limit, Ω_g^DLA is given by:

    \Omega_g^{DLA} = \frac{\mu m_H H_0}{c \rho_c} \frac{\sum N(H i)}{\Delta X},    (4)

where the sum is calculated for systems with log N(H i) >
20.3 along lines of sight with a total pathlength ΔX. We adopt
a ΛCDM cosmology with Ω_Λ = 0.7, Ω_m = 0.3, and H_0 =
70 km s⁻¹ Mpc⁻¹ (e.g. Spergel et al. 2003).
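To make the definitions concrete, the following Python sketch evaluates the absorption distance of Eq. (2) numerically and the discrete estimator of Eq. (4) for the adopted cosmology. It is a minimal illustration under stated assumptions: the function names, the composite Simpson integration with a fixed number of steps, the cgs values of the physical constants, and the inclusive treatment of the log N(H i) = 20.3 threshold are choices made here, not taken from the paper.

```python
import math

OMEGA_M, OMEGA_L = 0.3, 0.7   # adopted cosmology
H0_KMS_MPC = 70.0             # Hubble constant in km/s/Mpc
MU = 1.3                      # mean molecular mass of the gas
M_H = 1.6735e-24              # hydrogen atom mass (g)
G = 6.674e-8                  # gravitational constant (cgs)
C = 2.99792458e10             # speed of light (cm/s)
MPC_CM = 3.0857e24            # one megaparsec in cm


def e_of_z(z):
    """H(z)/H_0 as written below Eq. (2)."""
    return math.sqrt((1 + z) ** 3 * OMEGA_M
                     - (1 + z) ** 2 * (OMEGA_M + OMEGA_L - 1)
                     + OMEGA_L)


def absorption_distance(z, steps=1000):
    """X(z) of Eq. (2), via composite Simpson's rule (steps must be even)."""
    if z == 0:
        return 0.0
    h = z / steps
    f = lambda x: (1 + x) ** 2 / e_of_z(x)
    total = f(0.0) + f(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0


def omega_dla(log_n_values, delta_x, log_n_min=20.3):
    """Discrete estimator of Eq. (4); column densities in log10(cm^-2)."""
    h0 = H0_KMS_MPC * 1e5 / MPC_CM               # H_0 in s^-1
    rho_c = 3.0 * h0 ** 2 / (8.0 * math.pi * G)  # critical density (g cm^-3)
    # Threshold treated as inclusive here (an assumption of this sketch).
    sum_n = sum(10.0 ** ln for ln in log_n_values if ln >= log_n_min)
    return MU * M_H * h0 / (C * rho_c) * sum_n / delta_x
```

For a single line of sight searched over [z_min, z_max], the pathlength contribution is `absorption_distance(z_max) - absorption_distance(z_min)`, and ΔX is the sum of these contributions over all lines of sight.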



4.1. Sensitivity function 

We present in Fig. 6 the sensitivity functions g(z), i.e. the number
of lines of sight covering a given redshift, for the different DLA
surveys discussed in this paper.
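The sensitivity function is a simple count, which the short Python sketch below makes explicit. The representation of each line of sight as a (z_min, z_max) pair and the function names are assumptions made for illustration.

```python
def sensitivity(z, sightlines):
    """g(z): number of lines of sight whose searched interval
    [z_min, z_max] covers the redshift z."""
    return sum(1 for z_min, z_max in sightlines if z_min <= z <= z_max)


def redshift_path(sightlines):
    """Total redshift path, i.e. the integral of g(z) over z."""
    return sum(z_max - z_min for z_min, z_max in sightlines)
```

Evaluating `sensitivity` on a regular grid of redshifts gives the kind of curve shown in Fig. 6.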

The redshift sensitivity of SDSS is significantly larger than
that of the largest QSO compilation prior to SDSS (Péroux et al.
2003) at any redshift larger than 2.2. The higher sensitivity
(~ 30%) of our SDSS sample compared to that of PW09 over
the redshift range z ~ 2.2 - 3.5 reflects the increase in the number
of observed quasars between the two data releases (DR5 and
DR7). At z > 3.5, the sensitivity of the PW09 quasar sample
is higher than that presented here. This is due to (i) the choice
by these authors to exclude only 3000 km s⁻¹ from the emission
redshift of the quasar whereas we exclude 5000 km s⁻¹ to avoid
the proximity effect, (ii) the slightly stronger constraint on SNR
we impose when including quasar spectra in our survey, (iii) the
inclusion of quasars with some BAL activity in the PW09 sample
and (iv) the fact that we restrict our quasar sample to those with
confidence on the redshift measurement higher than 0.9. Very
few quasars are detected in SDSS at z > 4 and the determination
of Ω_g^DLA at these redshifts would benefit from dedicated surveys
(see Guimarães et al. 2009). We note that among the ten z_em ~ 5
quasars in the PW09 sample, J165902.12+270935.1 was
spectroscopically mis-classified as 'galaxy' instead of 'QSO' by the
SDSS while the redshift confidence for J075618.13+410408.5
is smaller than 0.9. These lines of sight were therefore not
included in our quasar sample (see Sect. 2). Furthermore, five of
these quasars have been rejected from our statistical sample,
either because of BAL activity (Sect. 2.3) or because of a low
mean signal-to-noise ratio of the spectrum (Sect. 2.2).



4.2. Importance of systematic effects 

With the large number of quasar spectra available in SDSS, we
reach a level at which systematic effects become more important
than statistical errors. In particular, the statistical results of
the survey are very sensitive to the determination of the total
absorption distance ΔX.

The presence of a DLA absorption line significantly attenuates
the quasar flux and decreases the signal-to-noise ratio of
the spectrum. Therefore, if a DLA line is present at the blue end
of the spectrum, the minimum redshift considered along the line
of sight prior to any search for DLA absorption can be
overestimated. The corresponding redshift range is rejected a priori
because of the presence of the DLA. The immediate consequence
is that the presence of a strong absorption can preclude its
inclusion in the statistical sample (see Fig. 7). We expect this effect to
be important for large N(H i), when λ_DLA is close to the blue end
of the spectrum (at 3800 Å), and when the signal-to-noise ratio
of the spectrum is low. Note that the bias we describe here can
affect all DLA surveys.



[Figure 6 legend: this work (no corr.); this work (corr.); PW09; Péroux et al. (2003)]

Fig. 6. Redshift sensitivity function g(z) of the different DLA
surveys considered in this paper. The black and red curves
represent respectively the sensitivity of our survey without (δv =
0 km s⁻¹) and with (δv = 10 000 km s⁻¹) correction of the
edge-effect bias (see Sect. 4.2). The green curve is from the sample
built by Péroux et al. (2003) whereas the orange dashed curve
represents the DR5 sensitivity from Prochaska & Wolfe (2009).
The vertical dotted line corresponds to z = 1.65, which is the
redshift below which the Ly-α line cannot be observed from the
ground because of the atmospheric absorption.



In order to assess the severity of this effect, we artificially
added damped Ly-α absorptions to the spectra of quasars from
sample S_QSO^0. Column densities are in the range log N(H i) =
20.3 - 22 and redshifts in the range 2.2 < z < 2.4. We applied our
automatic procedure to define z_min and compared the minimum
redshift obtained along each line of sight with (z_min^DLA) and
without (z_min^0) the DLA. For each set of values (z_abs, log N(H i)), we
calculated the fraction of DLAs that are missed because of their
influence on the redshift path (z_min^DLA > z_abs while z_min^0 < z_abs).
This fraction indicates the severity of the bias. The result of this
exercise is shown in Fig. 8. It is clear from this figure that there
is indeed a severe bias against the presence of DLAs at the blue
end of the spectra. This bias increases, as expected, with decreasing
redshift and increasing column density.

We propose to circumvent this problem by adding a systematic
velocity shift δv to z_min^0, defining z_min as:

    z_{\rm min} = z_{\rm min}^{0} + \frac{\delta v}{c}\,\left(1 + z_{\rm min}^{0}\right)    (5)

We therefore a priori exclude part of the spectrum at the blue
end. This decreases the total pathlength of the survey but
guarantees, for a sufficiently large δv, that any pathlength that
would be excluded in the case of the presence of a DLA is a priori
excluded from the statistics. In other words, the definition of
[z_min, z_max] will not depend upon the presence of a DLA in
this redshift range and the bias will be avoided a priori.
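
As a worked illustration of Eq. 5, the Python sketch below evaluates
the shifted minimum redshift for a sight line whose usable spectrum
starts at the blue end, 3800 Å. The function names are hypothetical,
and taking the blue end as z_min^0 is an assumption made for the
example only.

```python
C_KMS = 299792.458      # speed of light in km/s
LYA_REST = 1215.67      # H I Lyman-alpha rest wavelength in Angstrom


def lya_redshift(obs_wavelength):
    """Redshift at which Ly-alpha falls at the given observed wavelength."""
    return obs_wavelength / LYA_REST - 1.0


def shifted_zmin(zmin0, delta_v_kms):
    """Eq. 5: z_min = z_min^0 + (delta_v / c) * (1 + z_min^0)."""
    return zmin0 + (delta_v_kms / C_KMS) * (1.0 + zmin0)


# Illustrative case: z_min^0 set by the blue end of the spectrum.
z0 = lya_redshift(3800.0)
for dv in (0.0, 1500.0, 10000.0):
    print(f"delta_v = {dv:7.0f} km/s -> z_min = {shifted_zmin(z0, dv):.3f}")
```

With δv = 10 000 km s^-1 the blue end moves from z ≈ 2.13 to
z ≈ 2.23, which is consistent with the lower bound of the first
redshift bin used later in the paper.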

[Figure 7. Horizontal axis: observed wavelength (Å), from 3800 to
4500.]

Fig. 7. Example of the effect of the presence of a DLA on the
determination of z_min. Because of the DLA absorption near the
blue end of the spectrum (thick vertical line), the minimum redshift
along this line of sight is set by the automatic procedure
redwards of the absorption (the redshift range considered is marked
by a horizontal segment). If the DLA had no effect on the
determination of the minimum accessible redshift, then the latter
would have been set to the blue end of the spectrum. The consequence
is that the DLA is missed, while it should have been included in the
sample (see Sect. 4.2). Note that the reality of the DLA is confirmed
by the detection of metal lines whose positions are indicated by
vertical dotted lines. The spectrum shown in this figure is that of
J162131.46+234550.8. We propose a procedure to avoid this systematic
effect (see text).

Note that in their survey, Prochaska et al. (2005) are aware
of this effect and apply a shift of δv = 1 500 km s^-1, which they
claim is sufficient to avoid the bias, considered as minor. They
also restrict their analysis to z > 2.2 so that the very blue end
of the spectrum is not considered. However, the wings of a DLA
can significantly lower the signal-to-noise ratio over a velocity
range as large as several thousand kilometres per second. For
example, a log N(H I) ~ 21 DLA lowers the SNR by more than
10% (and obviously up to 100% in the core of the profile) over
a velocity range of about 10 000 km s^-1. We therefore expect the
bias to be corrected only for large values of δv.
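
The order of magnitude quoted for the extent of the damping wings can
be checked with the usual Lorentzian-wing approximation to the Ly-α
optical depth. The sketch below is not the procedure used in the
paper: it assumes the standard wing coefficients (derived from the
Ly-α oscillator strength and damping constant), ignores the Doppler
core, and takes the transmitted flux as a proxy for the SNR. Function
names are hypothetical.

```python
import math

C_KMS = 299792.458      # speed of light in km/s
LYA_REST = 1215.67      # Ly-alpha rest wavelength in Angstrom
# Lorentzian-wing approximation (assumed here, valid away from the core):
# tau = WING_COEFF * N / (WING_CORE + dlambda**2), dlambda in rest Angstrom
WING_COEFF = 4.26e-20   # cm^2 Angstrom^2
WING_CORE = 6.04e-10    # Angstrom^2


def wing_transmission(log_nhi, velocity_kms):
    """Transmitted flux exp(-tau) at a velocity offset from line centre."""
    dlam = LYA_REST * velocity_kms / C_KMS
    tau = WING_COEFF * 10.0 ** log_nhi / (WING_CORE + dlam ** 2)
    return math.exp(-tau)


def half_width_for_loss(log_nhi, loss):
    """Velocity offset (km/s) at which the flux is reduced by `loss`."""
    tau = -math.log(1.0 - loss)
    dlam = math.sqrt(WING_COEFF * 10.0 ** log_nhi / tau - WING_CORE)
    return dlam * C_KMS / LYA_REST


full_width = 2.0 * half_width_for_loss(21.0, 0.10)
print(f"flux at 5000 km/s for log N = 21: {wing_transmission(21.0, 5000.0):.3f}")
print(f"full velocity range with > 10% loss: {full_width:.0f} km/s")
```

Under these assumptions a log N(H I) = 21 absorber removes more than
10% of the flux over a total range close to 10 000 km s^-1, in line
with the figure quoted above.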

We performed the same test as described above with δv =
2 500, 5 000, 7 500 and 10 000 km s^-1. The different panels of
Fig. 9 give the results. It is clear that the bias is still quite
strong for δv = 2 500 km s^-1 and almost vanishes for δv =
10 000 km s^-1.

We will see in the following that the systems with
log N(H I) ~ 21.3 contribute most to the total cosmological mass
density of neutral gas. The residual bias must therefore be well
below 10% for this kind of column density. This is the case
with δv = 10 000 km s^-1: only systems with very large column
densities (log N(H I) > 21.9) and redshifts below 2.3 have a > 20%
probability of being missed. As we expect about one such system in
the whole SDSS survey and at any redshift, the remaining bias
is well below the Poissonian statistical error. Applying a δv shift
larger than 10 000 km s^-1 would therefore be useless and would
unnecessarily decrease the total pathlength of the survey (see
Fig. 6).

[Figure 8. Horizontal axis: log N(HI); colour scale up to 100%.]

Fig. 8. Result of a simulation to estimate the severity of the bias
(defined as the percentage of DLAs missed; see text) due to the
incorrect pathlength determination resulting from the presence
of a DLA at the blue end of the spectrum, as a function of the
redshift and neutral hydrogen column density of the absorber.
The colour scale is given at the top of the figure. Additionally,
the dotted, dashed-dotted and dashed lines represent the 5, 10
and 15% contours, respectively.

Fig. 9. Same as Fig. 8 when applying different corrections. From
left to right and top to bottom: δv = 2 500, 5 000, 7 500 and
10 000 km s^-1. Colours and line styles are as in Fig. 8.

In order to verify that the bias indeed affects the results from
PW09, we searched their quasar sample for DLAs at z = 2.2-2.4
that their procedure missed. To flag these systems, we could
have searched for strong Mg II absorption lines, which have been
shown to be good tracers of DLAs (e.g. Rao & Turnshek 2000),
but the corresponding absorption lines are unfortunately
redshifted either beyond the SDSS spectrum (z > 2.28) or at its
very red end, where the quality of the spectra becomes very
poor. We therefore automatically searched for strong Fe II
absorption lines instead. Out of 57 systems detected this way, 32
are in Prochaska & Wolfe's statistical sample and 25 have been
missed because the minimum redshift is set redwards of the DLA
due to the decreased signal-to-noise ratio. A few examples of such
DLAs are given in Fig. 10.
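
The redshift limit quoted for Mg II, and the reason Fe II remains
usable, follow from simple wavelength arithmetic. The sketch below
assumes a red limit of 9200 Å for the SDSS spectra and standard
rest wavelengths for the lines; the dictionary and function names are
hypothetical.

```python
SDSS_RED_END = 9200.0   # Angstrom; assumed red limit of the SDSS spectra

# Rest-frame wavelengths in Angstrom (standard laboratory values).
REST_WAVELENGTHS = {
    "Mg II 2796": 2796.35,
    "Mg II 2803": 2803.53,
    "Fe II 2382": 2382.77,
    "Fe II 2600": 2600.17,
}


def max_redshift(rest_wavelength, red_end=SDSS_RED_END):
    """Highest absorber redshift for which the line is still covered."""
    return red_end / rest_wavelength - 1.0


def observed_wavelength(rest_wavelength, z_abs):
    """Observed wavelength of a line from an absorber at z_abs."""
    return rest_wavelength * (1.0 + z_abs)


for name, rest in REST_WAVELENGTHS.items():
    print(f"{name}: covered up to z = {max_redshift(rest):.2f}")
```

Under these assumptions the Mg II doublet leaves the spectrum at
z ≈ 2.28, whereas the Fe II lines stay within the covered range over
the whole z = 2.2-2.4 interval considered here.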

In the following, we will apply the δv cut to avoid the bias
(850 bona-fide DLAs are then left in the statistical sample) and
correct our statistical results for the reliability of our DLA
sample (~ 93% at z > 3.2). Note that, while the first correction will
have important consequences, the second correction only has a
minor effect on the Ω_g^DLA(z) results.

P. Noterdaeme et al.: Evolution of the cosmological neutral gas mass density

[Figure 10. Four panels; horizontal axis: observed wavelength (Å).]

Fig. 10. Spectra of four DLAs with z_abs < z_min in the
sample of PW09. The vertical lines show the positions
of the Ly-α absorptions and the horizontal segment corresponds
to the redshift pathlength (as defined by these authors)
probed along each line of sight. In all cases, the presence
of metal lines confirms the DLA. From left to right and
top to bottom: J092322.86+033821.5, J155556.90+480015.0,
J084006.65+362531.6 and J095604.44+344415.5.

Table 1. Parameters of the fits to the N(H I) frequency
distribution (see Fig. 11).

    Double power law        Γ function
    log k_d = -23.09        log k_g = -22.75
    log N_d =  21.27        log N_g =  21.26
    α_d1    =  -1.60        α_g     =  -1.27
    α_d2    =  -3.48



4.3. Frequency distribution 

In Fig. 11 we present the N(H I) frequency distribution function
f_HI(N, X) of the whole sample. Vertical error bars are
representative of Poissonian statistical errors while horizontal bars
represent the log N(H I) binning (by steps of 0.1 dex). We find that
a double power law (e.g. PW09),

    f_{\rm HI}(N, X) =
      \begin{cases}
        k_d \left(\frac{N}{N_d}\right)^{\alpha_{d1}} & \text{for } N < N_d \\
        k_d \left(\frac{N}{N_d}\right)^{\alpha_{d2}} & \text{for } N \geq N_d
      \end{cases}
    \qquad (6)

or a Γ function (e.g. Fall & Pei 1993; Péroux et al. 2003),

    f_{\rm HI}(N, X) = k_g \left(\frac{N}{N_g}\right)^{\alpha_g} e^{-N/N_g}    (7)

fit the data equally well (χ²_ν = 1.1 and 0.7, respectively). The
best-fit values of the parameters are summarised in Table 1. Slight
differences between the Γ-function and double power-law fits (see
also next Section) are not statistically significant.

[Figure 11. Legend: double power law; Γ function; z = 0 (Zwaan et al.
2005); PW09. Horizontal axis: log N(HI).]

Fig. 11. N(H I) frequency distribution of damped Lyman-α systems
in SDSS-DR7 from our automatic procedure. Fits to the
observations by a single power law, a double power law and a
gamma function are given as, respectively, a dotted blue, dashed
green and solid red line. The double power-law fit to the PW09
sample is indicated by the dashed orange line. The Γ-function fit
to the frequency distribution obtained by Zwaan et al. (2005b)
from 21-cm observations at z = 0 is also indicated as a solid
grey line for direct comparison.
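
The two fitting forms of Eqs. 6 and 7 are easy to evaluate
numerically. The Python sketch below is illustrative only: it assumes
that the normalisations and characteristic column densities listed in
Table 1 are base-10 logarithms, and the function names are
hypothetical.

```python
import math

# Table 1 values; k and N are assumed to be given as base-10 logarithms.
LOG_KD, LOG_ND, ALPHA_D1, ALPHA_D2 = -23.09, 21.27, -1.60, -3.48
LOG_KG, LOG_NG, ALPHA_G = -22.75, 21.26, -1.27


def f_double_power_law(log_n):
    """Eq. 6: broken power law with a change of slope at N_d."""
    x = 10.0 ** (log_n - LOG_ND)
    alpha = ALPHA_D1 if log_n < LOG_ND else ALPHA_D2
    return 10.0 ** LOG_KD * x ** alpha


def f_gamma(log_n):
    """Eq. 7: power law with an exponential cut-off above N_g."""
    x = 10.0 ** (log_n - LOG_NG)
    return 10.0 ** LOG_KG * x ** ALPHA_G * math.exp(-x)


for log_n in (20.3, 21.0, 21.5, 22.0):
    print(f"log N = {log_n:4.1f}: "
          f"log f_dpl = {math.log10(f_double_power_law(log_n)):7.2f}, "
          f"log f_gamma = {math.log10(f_gamma(log_n)):7.2f}")
```

Both forms give nearly the same value at the DLA threshold and
diverge mostly above log N(H I) ≈ 21.5, where the data are sparse.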



The slope of the distribution is found to be α ≈ -1.6 for
log N(H I) < 21.4, which is close to what is expected from models of
photo-ionised gas in hydrostatic equilibrium (e.g. Petitjean et al.
1992). This is flatter than what is found by Prochaska et al.
(2005), α ≈ -2 for their whole sample (see also PW09).
Péroux et al. (2005) already mentioned that the slope of the
frequency distribution at N(H I) around the conventional DLA
threshold is flatter than -2.

The slope of f_HI(N, X) at large N(H I), α ≈ -3.5, implies that
systems with very large column density are very rare. We find a
slope much flatter than PW09 (α ≈ -6). This is probably due
to our definite detection of the first DLA with log N(H I) = 22,
at z_abs = 3.286 towards SDSS J081634+144612. We obtained
high spectral resolution data for this quasar with UVES in April
2008. The column density measured from the UVES spectrum is
log N(H I) = 22.0 ± 0.1 (see Fig. 12), while the column density
derived automatically from the SDSS spectrum is
log N(H I) = 21.92 ± 0.19. This is the absorber with the largest
column density observed to date along a quasar line of sight.
Such a column density is similar to that of DLAs detected at
the redshift of Gamma-Ray Bursts (e.g. Vreeswijk et al. 2004;
Jakobsson et al. 2006; Ledoux et al. 2009). A detailed analysis of
this system will be presented in a future paper. Note that from
the fit of the frequency distribution, no more than one system
like this one is expected in the whole SDSS survey.

The shape of the N(H I) frequency distribution at high column
densities suggests that there is no abrupt transition between
neutral hydrogen and molecular hydrogen in diffuse clouds as
advocated by Schaye (2001). This is supported by the results
of Zwaan & Prochaska (2006), who used CO emission maps to
show that the H2 distribution function is a continuous extension
of f_HI(N, X) for high column densities. In addition, there is
only a small tendency for increasing H2 molecular fraction with
larger H I column density (Ledoux et al. 2003; Noterdaeme et al.
2008a).

[Figure 12. Horizontal axis: v (km s^-1), from -10 000 to 10 000.]

Fig. 12. Damped Lyman-α absorption line at z_abs = 3.286 towards
SDSS J081634.40+144612.86. The neutral hydrogen column density,
measured from the Voigt profile fit to the smoothed (7 pixel
boxcar) UVES spectrum, is log N(H I) = 22.0 ± 0.1. This is the
highest H I column density ever measured along QSO lines of sight.

4.4. Convergence of Ω_g^DLA and the contribution of
sub-damped Lyman-α systems

One major issue when measuring the cosmological mass density
of neutral gas in DLAs is the convergence of Ω_g^DLA at large
log N(H I) values. An artificial cut at large N(H I) was frequently
introduced to prevent the integration of a single power law from
diverging. This was justified by small number statistics at the
highest column densities. The SDSS allowed PW09 to observe for
the first time that the slope of f_HI(N, X) is steeper than -2 for
log N(H I) > 21.5, directly demonstrating that Ω_g^DLA converges.

Figure 13 presents the cumulative cosmological mass density
of neutral gas in DLAs as a function of the maximum H I
column density from data in our study and from the different fits
to the frequency distribution. It is apparent that Ω_g^DLA converges
by log N(H I) = 22.
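
The convergence can be illustrated by integrating N f_HI(N, X) up to
a maximum column density for the two fitted forms. The sketch below
is a minimal numerical check, not the computation behind Fig. 13: it
omits the constant cosmological prefactor of Ω_g^DLA (which cancels
in the ratio), assumes the Table 1 values are base-10 logarithms, and
uses hypothetical function names.

```python
import math


def f_gamma(log_n, log_k=-22.75, log_ng=21.26, alpha=-1.27):
    """Gamma-function fit (Table 1 values assumed to be logarithms)."""
    x = 10.0 ** (log_n - log_ng)
    return 10.0 ** log_k * x ** alpha * math.exp(-x)


def f_double(log_n, log_k=-23.09, log_nd=21.27, a1=-1.60, a2=-3.48):
    """Double power-law fit (Table 1 values assumed to be logarithms)."""
    x = 10.0 ** (log_n - log_nd)
    return 10.0 ** log_k * x ** (a1 if log_n < log_nd else a2)


def mass_integral(f, log_lo, log_hi, steps=4000):
    """Trapezoidal estimate of the integral of N f(N) dN, done in log N."""
    h = (log_hi - log_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        log_n = log_lo + i * h
        n = 10.0 ** log_n
        weight = 0.5 if i in (0, steps) else 1.0
        # N f(N) dN = N^2 f(N) ln(10) dlogN
        total += weight * n * n * f(log_n) * math.log(10.0)
    return total * h


def converged_fraction(f, log_max, log_min=20.3, log_inf=24.0):
    """Share of the DLA integral reached when stopping at log_max."""
    return mass_integral(f, log_min, log_max) / mass_integral(f, log_min, log_inf)


for log_max in (21.0, 21.5, 22.0):
    print(f"log N_max = {log_max}: gamma {converged_fraction(f_gamma, log_max):.3f}, "
          f"double power law {converged_fraction(f_double, log_max):.3f}")
```

Here log N = 24 stands in for infinity; with both fits almost all of
the integral is already accumulated at log N(H I) = 22.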

A change of inflexion in the frequency distribution is apparent
at log N(H I) ~ 21. This is best seen in Fig. 14, which
gives the slope of the above quantity as a function of log N(H I).
In other words, the area below the curve represents the
contribution of the different intervals of N(H I) to the total H I
mass density. It is apparent that systems with very large (resp. low)
column densities contribute little to the census of neutral gas
because of their paucity (resp. low column density). The most
important contribution comes from DLAs with log N(H I) ~ 21.2.
A simple extrapolation of the Γ-function fit to column densities
smaller than log N(H I) = 20.3 shows that sub-DLAs, with
19 ≲ log N(H I) < 20.3, contribute about 20% of the mass
density of neutral hydrogen at z ≳ 2.2. Extrapolating the double
power-law fit gives a sub-DLA contribution to the neutral-gas mass
density of about 30%.
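
Two of the numbers quoted above can be recovered analytically from
the fitted parameters. The sketch below assumes that the Table 1
values are base-10 logarithms and that the contribution per unit
log N scales as N² f_HI(N, X); the function names are hypothetical.
The Γ-function extrapolation would need a numerical integration and
is not reproduced here.

```python
import math

LOG_ND, ALPHA_D1, ALPHA_D2 = 21.27, -1.60, -3.48   # double power law
LOG_NG, ALPHA_G = 21.26, -1.27                     # gamma function


def gamma_peak_log_n():
    """log N at which N^2 f(N) peaks for the gamma form.

    d/dx [x^(2 + alpha) exp(-x)] = 0 gives x = N / N_g = 2 + alpha.
    """
    return LOG_NG + math.log10(2.0 + ALPHA_G)


def power_law_mass(x_lo, x_hi, alpha):
    """Integral of x^(alpha + 1) dx between x_lo and x_hi (alpha != -2)."""
    p = alpha + 2.0
    return (x_hi ** p - x_lo ** p) / p


def sub_dla_ratio(log_lo=19.0, log_dla=20.3):
    """Sub-DLA mass integral relative to the DLA one (double power law)."""
    x_lo = 10.0 ** (log_lo - LOG_ND)
    x_dla = 10.0 ** (log_dla - LOG_ND)
    sub = power_law_mass(x_lo, x_dla, ALPHA_D1)
    dla = (power_law_mass(x_dla, 1.0, ALPHA_D1)
           + power_law_mass(1.0, math.inf, ALPHA_D2))
    return sub / dla


print(f"peak contribution (gamma fit): log N = {gamma_peak_log_n():.2f}")
print(f"sub-DLA / DLA (double power law): {sub_dla_ratio():.2f}")
```

The extrapolated double power law puts the sub-DLA integral at about
a third of the DLA one, and the Γ-function fit places the peak
contribution slightly above log N(H I) = 21.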

Although both the double power law and the Gamma function are
good fits to f_HI(N, X), it can be seen in Figs. 11 and 14 that the
slope of f_HI(N, X) could be smaller at the low end (log N(H I) ~
20.3; Péroux et al. 2005; Guimarães et al. 2009). The gamma
function could best reproduce this regime. Furthermore, the double
power law produces a spike at log N(H I) = 21.3 that
is not present in the data. This is due to the arbitrary and
somewhat unphysical discontinuous change in the slope of the double
power-law fit.

It is interesting to note here that the contribution of the
different column densities to Ω_g^DLA at high redshift is very
similar to what is observed in the local Universe (Zwaan et al.
2005b). This indicates that the H I surface density profile of the
neutral phase at high z is not significantly different from that
observed at z = 0.

Fig. 13. Cumulative cosmological mass density of neutral gas in
DLAs as a function of maximum column density. The apparent
flattening of the curve at log N(H I) ~ 21.7 implies convergence.
Double power-law and Γ functions are equally possible solutions
while a single power law diverges and is not representative of the
data at high column densities.

4.5. Evolution with cosmic time

In this Section, we study the evolution over cosmic time of the
cosmological mass density of neutral gas, Ω_g^DLA. This evolution
is the result of several important processes involved in galaxy
formation, including the consumption of gas during star formation
activity but also the consequences of energy releases during
the hierarchical building up of systems from smaller blocks
(e.g. Ledoux et al. 1998; Haehnelt et al. 1998), or the ejection of
gas from the central parts of massive halos into the intergalactic
medium through galactic winds (Fall & Pei 1993).

Storrie-Lombardi & Wolfe (2000) and Péroux et al. (2003)
claimed an increase of Ω_g^DLA when z decreases from z ~ 3 to
z ~ 2. This is due to the lack in their sample of high column
density DLAs at high redshift. From their SDSS DLA search,
Prochaska et al. (2005) observe on the contrary a significant
decrease of Ω_g^DLA between z ~ 4 and z ~ 2. They interpret this
as the result of neutral gas consumption by star formation
activity and/or the ejection of gas into the intergalactic medium.
The value they derive for Ω_g^DLA at z = 2.2 is almost equal to that
at z = 0, indicating very little or no evolution of the cosmological
mass density of neutral gas over the past ten billion years (see
PW09). We argue here that the differences between these results
are artificial and can be reconciled within errors at least up to
z ~ 3.2.

On the one hand, the redshift pathlength probed along SDSS
lines of sight should be restricted so that the edge bias described
in Section 4.2 is avoided. We applied to the PW09 sample the same
velocity cut as for our sample and find, as expected, that both
samples yield similar results (see Fig. 15). The differences
between the measurements from the two SDSS samples are within
statistical error bars and are mainly due to the slightly higher
completeness and larger number statistics of our sample. The
immediate implication of the edge-effect correction is that it
cannot be claimed that Ω_g^DLA does not evolve from z = 2.2 to z = 0.

Fig. 14. Cosmological mass density of neutral gas contained
in systems of different column densities. The dashed green and
red curves correspond respectively to the double power-law
and Γ-function fits to f_HI(N, X) in SDSS DR7, while the blue
solid curve represents the Γ-function fit to f_HI(N, X) at z = 0
(Zwaan et al. 2005b).

On the other hand, we also note that a decrease of Ω_g^DLA
with decreasing redshift cannot be excluded from the sample
of Péroux et al. (2003). Indeed, a different binning of their data
shows that they are actually consistent with a decrease of Ω_g^DLA
over the redshift range z ~ 3.2 - 2.2 (see Fig. 15). In addition,
while the sample from Péroux et al. is large enough to obtain a
reasonable measurement of Ω_g^DLA at high redshift, it is probably
too small to infer strong conclusions on its evolution. Fig. 16
illustrates the effect of the sample size on the determination of
Ω_g^DLA. We construct randomly selected SDSS QSO sub-samples
of increasing total path length, ΔX, and calculate the ratio Ω/Ω_0
of the Ω_g^DLA values from the sub-sample and the whole SDSS sample.
Note that the survey used in the present paper has ΔX = 11 099,
whereas Prochaska & Wolfe's survey has ΔX = 8 475 (applying
the new definition of z_min) and Péroux et al.'s survey has
ΔX = 1 540.

All this reconciles the SDSS measurements with the results
from Péroux et al. (see Fig. 17). It can also be seen that the
points at z ~ 2 are consistent with those at lower redshift given
their large error bars (Rao et al. 2006). Note also that the value
we derive at z ~ 2.4 is consistent with that obtained from the
Hamburg-ESO survey (Ω_g^DLA ~ 1 × 10^-3, A. Smette, private
communication; Smette et al. 2005).

At redshifts above z = 3.2, there is still a discrepancy
between the results from SDSS and those from previous surveys (see
Fig. 15). As pointed out by Prochaska et al. (2005), the size of
the samples prior to SDSS was insufficient to detect the convergence
of Ω_g^DLA at large N(H I). Therefore, the inclusion of a single
large column density system in these samples could change
the results on Ω_g^DLA significantly. In turn, the low spectral
resolution of SDSS combined with a dense Ly-α forest could lead to
slightly overestimated column densities for high-z DLAs.

Fig. 15. The cosmological mass density of neutral gas in DLAs,
Ω_g^DLA, as a function of redshift for z > 1.65. Vertical error bars
represent the 1σ uncertainties on the measurements while horizontal
error bars represent the redshift bins under consideration.
Black error bars are our measurements at z > 2.2. Orange error
bars are derived from the PW09 sample when applying a
δv = 10 000 km s^-1 cutoff to correct for the edge effect (see
Section 4.2 and Eq. 5). The uncorrected values derived from the
same sample are shown as dashed orange error bars. The green
squares are derived from the sample of Péroux et al. (2003) using
a different binning compared to that adopted by these authors.
Finally, the hashed region represents the 1σ range on Ω_g^DLA
at z = 0 from Zwaan et al. (2005b). We note also that the amount
of baryons in stars at z = 0 is Ω_* = (2.5 ± 1.3) × 10^-3
(Cole et al. 2001).

The measurement of the cosmological mass density of neutral
gas at intermediate and low redshifts is a difficult task. The
low incidence of DLAs and the need for observations from
space have led to samples of limited sizes. Rao & Turnshek
(2000) and Rao et al. (2006) used a novel technique to measure
Ω_g^DLA at 0.1 < z < 1.6. They searched for the Ly-α absorption
associated with Mg II systems, whose statistics are very large
at those redshifts. The corresponding values of Ω_g^DLA are high
and therefore have been extensively discussed in the literature.
In particular, PW09 concluded that the values obtained at z ~ 1
by Rao et al. (2006) are difficult to reconcile with the value they
obtained at z ~ 2.2 and suffer from a statistical fluke or an
observational bias. After correcting for the edge bias discussed
earlier, it can be seen that the SDSS results are no longer
incompatible with the Rao et al. (2006) results. However, it is
still possible that the high values of Ω_g^DLA from Rao et al. are
overestimated due to systematics associated with the selection of
low and intermediate redshift DLAs directly from strong Mg II
absorption (see Péroux et al. 2004; Dessauges-Zavadsky et al. 2009).
Only a large blind survey for DLAs at z ~ 1 could solve this issue.

Fig. 16. Statistical uncertainty on the determination of Ω_g^DLA as
a function of the sample size. Each circle represents the ratio
between the measurement of Ω_g^DLA for a randomly selected sample
of total absorption pathlength ΔX (Ω) and that obtained from the
full SDSS sample, Ω_0. Note that ΔX = 1 540, 8 475 and 11 099
in, respectively, Péroux et al. (2003), Prochaska & Wolfe (2009,
after correcting for the edge bias) and the present work.

Table 2. The cosmological mass density of neutral gas in DLAs
from this work (SDSS-DR7)

    z            <z>^a    ΔX      Ω_g^DLA (×10^-3)
    2.23-2.60    2.44     2774    0.82 ± 0.09
    2.60-2.88    2.74     2774    0.85 ± 0.09
    2.88-3.20    3.02     2774    1.03 ± 0.10
    3.20-5.19    3.49     2774    1.29 ± 0.15

^a Median redshift corresponding to half the total pathlength ΔX in
the redshift bin.



Note also that Ω_g^DLA(z ~ 2), obtained from the sample of
Péroux et al. (2003), is almost equal to Ω_g^DLA(z ~ 2.2),
obtained here from SDSS-DR7. This could indicate a flattening of
Ω_g^DLA(z) at z ~ 2. Large intermediate-redshift optical and radio
surveys are therefore still required to constrain the evolution of
Ω_g^DLA from z = 2 to z = 0 (see discussion in Gupta et al. 2009).

5. Conclusion 

We have demonstrated the feasibility and the robustness of a
fully automatic search of SDSS-DR7 for DLA systems based on
the identification of DLA profiles by correlation analysis. This
led to the identification of about one thousand DLAs, representing
the largest DLA database to date. We tested the accuracy of
the N(H I) measurements and quantified the high level of
completeness and reliability of the detections.

In agreement with previous studies (Péroux et al. 2003;
Prochaska et al. 2005; Prochaska & Wolfe 2009), we show that
a single power law is a poor description of the N(H I) frequency
distribution at log N(H I) > 20.3. A double power law or a Γ
function gives a better fit. The finding of one log N(H I) = 22
DLA, confirmed by UVES high spectral resolution observations,
shows that the slope of f_HI(N, X) at log N(H I) > 21.5 is -3.5.

Fig. 17. Cosmological mass density of neutral gas in DLAs,
Ω_g^DLA, as a function of redshift. The red triangle at z = 0 is the
value from 21-cm maps by Zwaan et al. (2005b). The blue filled
circles at z ~ 1 are the measurements of Ω_g^DLA from Rao et al.
(2006). The green square at z ~ 2 is derived from the sample
of Péroux et al. (2003). Measurements at z > 2.2 are from the
present work based on SDSS DR7.

The convergence of Ω_g^DLA for large N(H I) indicates that the
cosmological mass density of neutral gas at z ~ 2.2 - 5 is
dominated by bona-fide damped Lyman-α systems. The relative
contribution of DLAs reaches its maximum around log N(H I) = 21,
similar to what is observed in the local Universe. The paucity
of very high column density DLAs implies that they contribute
only a small fraction of the cosmological mass density of
neutral gas. On the other hand, an extrapolation of f_HI(N, X) to
log N(H I) < 20.3 suggests that sub-DLA systems contribute
about one fifth of the neutral hydrogen at high redshift, in
agreement with the results of Péroux et al. (2005).

We identified an important observational bias due to an
edge effect and proposed a method to avoid it. Such a bias
could also partly explain the higher values of Ω_g^DLA found by
Prochaska et al. (2005) when selecting only bright quasars, as
the bias discussed here preferentially affects faint quasars with
lower signal-to-noise ratios. Indeed, when not correcting for the
bias, we find a 10% higher Ω_g^DLA from a bright QSO sub-sample
(i < 19.5) compared to a faint QSO sub-sample (i > 19.5), while
this difference is only 5% when the bias is avoided.

We derive the evolution with time of the cosmological mass
density of neutral gas in Fig. 17 and summarise our measurements
in Table 2. We observe a decrease with time of the cosmological
mass density of neutral gas between z ~ 3.2 and
z ~ 2.2, confirming the results from Prochaska et al. (2005; also
Prochaska & Wolfe 2009). However, we argue that the value at
z ~ 2.2 is significantly higher (by up to a factor of two) than the
value at z = 0, indicating that Ω_g^DLA keeps evolving at z < 2.2.
Interestingly, models of the evolution of the reservoir of neutral
gas (Hopkins et al. 2008) also predict a value of Ω_g^DLA at z ~ 2.2
higher than that at z = 0. The small number statistics at high
redshift and the insufficient spectral resolution of SDSS spectra do
not allow for a strong conclusion on the neutral gas content of
the Universe at z ~ 4 - 5. Further surveys are therefore required
at z > 4 (see e.g. Guimarães et al. 2009).

We measure Ω_g^DLA(z ~ 3) ≈ 10^-3. This implies that neutral
gas accounts for only 2% of the baryons at high redshift,
according to the latest cosmological parameters from WMAP
(Komatsu et al. 2009), and hence that most of the baryons
are in the form of ionised gas in the intergalactic medium (e.g.
Petitjean et al. 1993).
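
The 2% figure is a one-line ratio. The Python sketch below spells it
out; the baryon density used is an assumed WMAP 5-year value that is
not quoted in the text, and the variable names are hypothetical.

```python
OMEGA_DLA = 1.0e-3       # from the text: Omega_g^DLA(z ~ 3) ~ 1e-3
OMEGA_BARYON = 0.0456    # assumed WMAP 5-year baryon density parameter

# Share of the baryons found as neutral gas in DLAs.
neutral_fraction = OMEGA_DLA / OMEGA_BARYON
print(f"neutral gas / baryons: {100.0 * neutral_fraction:.1f}%")
```

Any baryon density within the WMAP uncertainties gives a fraction
close to 2%, so the conclusion does not depend on the exact value
adopted.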

The amount of baryons locked up into stars at z = 0 (Ω_* =
(2.5 ± 1.3) × 10^-3; Cole et al. 2001) is about twice the amount
of neutral gas contained in high redshift DLAs. This implies that
the DLA phase must be replenished in gas before the present
epoch (see also PW09) at a rate similar to that of its consumption
(Hopkins et al. 2008). This is also required to explain the
properties of z = 2 - 3 Lyman-break galaxies (Erb 2008). The
replenishment of H I gas could take place through the accretion
of matter from the intergalactic medium and/or recombination
of ionised gas in the walls of supershells. Several pieces of
observational evidence for cold gas accretion at high redshift have
been published recently (e.g. Nilsson et al. 2006; Dijkstra et al.
2006; Noterdaeme et al. 2008b). On the other hand, supershells
provide a natural explanation for the proportionality between star
formation and replenishment rate (Hopkins et al. 2008). The results
presented here provide strong constraints for numerical modelling
of the hierarchical evolution of galaxies (e.g. Pontzen et al.
2008). Note that galactic winds are likely to play an important
role in the evolution of the cosmological mass density of neutral
gas (Tescari et al. 2009).

Finally, it has long been discussed whether optically
selected quasar samples are affected by extinction due to
the presence of dust on the line of sight (Boissé et al. 1998;
Ellison et al. 2001; Smette et al. 2005). We recently presented
direct evidence that lines of sight towards colour-selected
quasars are biased against the detection of diffuse molecular
clouds (Noterdaeme et al. 2009). Pontzen & Pettini (2009)
estimated that dust-biasing could lead to an underestimate of the
metal budget by about 50%. Although it has been claimed that the
global census of neutral gas should be little affected by
dust-biasing (Ellison et al. 2008; Trenti & Stiavelli 2006), it will
be interesting to revisit this issue.
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